REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models

Authors: Hussain Shehzeen Samarah; Koushanfar Farinaz; Neekhara Paarth; Zhang Ruisi
Publication date: 18/10/2023

We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address these challenges, REMARK-LLM proposes three new components: (i) a learning-based message encoding module to infuse binary signatures into LLM-generated texts; (ii) a reparameterization module to transform the dense distributions from the message encoding into the sparse distribution of the watermarked textual tokens; and (iii) a decoding module dedicated to signature extraction. Furthermore, we introduce an optimized beam search algorithm to guarantee the coherence and consistency of the generated content. REMARK-LLM is rigorously trained to encourage the preservation of semantic integrity in watermarked content while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability: it inserts twice as many signature bits into the same texts as prior art, all while maintaining semantic integrity. In addition, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.

arXiv.org e-Print Archive
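The abstract does not say how the reparameterization module turns the dense distributions produced by message encoding into sparse, token-level distributions. One standard differentiable way to do this is the Gumbel-softmax trick, sketched below purely as an assumption for illustration; it is not REMARK-LLM's published implementation. The function names `gumbel_softmax` and `hard_token`, the toy 4-token vocabulary, and the temperature values are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=None):
    """Draw a relaxed one-hot sample over a vocabulary (Gumbel-softmax trick).

    Assumption: `logits` stands in for the dense per-position scores a message
    encoder might produce; the result is a probability vector that approaches
    one-hot as `temperature` goes to 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    noisy = []
    for x in logits:
        # Sample U ~ Uniform(0, 1), clamped so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is -log(-log(U)); add it, then scale by 1/temperature.
        noisy.append((x - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_token(probs):
    """Collapse a relaxed distribution to one token id by taking the argmax."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical toy 4-token vocabulary
    for t in (1.0, 0.1):
        probs = gumbel_softmax(logits, temperature=t, rng=random.Random(0))
        print(t, [round(p, 3) for p in probs], hard_token(probs))
```

The temperature controls the trade-off: at higher values the output stays smooth, so gradients can flow back to the encoder during training, while at lower values it approaches the sparse, one-hot distribution of an actual token choice, which `hard_token` then reads off.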

Robust and Efficient Deep Learning for Multimedia Generation and Recognition

Author: Hussain Shehzeen Samarah
Publication venue: eScholarship, University of California
Publication date: 01/01/2023

Deep Neural Networks (DNNs) have transformed the field of multimedia generation and recognition by replacing traditional hand-engineered systems in domains like vision, speech, and text. This is because DNNs can operate end-to-end and model complex dependencies, yielding state-of-the-art results on several generation and recognition benchmarks. However, three key challenges must be addressed for the practical, secure, and reliable deployment of DNN-based media processing systems: 1) Robustness: DNNs are vulnerable to adversarial attacks; 2) Data-Requirement: DNNs often require large amounts of labelled data; 3) Compute-Efficiency: DNNs require extensive compute resources.

My research focuses on addressing these three challenges of DNN-based multimedia generation and recognition systems. On the robustness side, I first analyze practical vulnerabilities of DNN-based recognition systems and then propose a robust defense framework that can reliably identify adversarial inputs using perceptually informed input transformations. To address the challenge of data-requirement, I develop training frameworks that can effectively adapt foundation models trained using self-supervised learning for recognition and synthesis tasks in a data-efficient manner. Finally, to address the challenge of compute-efficiency, I propose acceleration methods using hardware-software codesign that significantly reduce latency and resource requirements while preserving the synthesis quality of DNN generators.

eScholarship - University of California

Robust and Efficient Deep Learning for Multimedia Generation and Recognition

Author: Hussain Shehzeen Samarah
Publication date: 01/01/2023

Ezid